INDEX

Section: Misc. Reference Manual Pages (HUM)
Updated: Version 3.7

The following programs are available for Humanities users:

accent         user-controlled accent module
cedilla        convert plus mark to cedilla
cfreq          character (or digraph) frequency count
dict           split file into dictionary sections
exclude        exclusion module for concordance
format         format and count keywords in concordance
freq           word frequency count
kwal           key word and line concordance
kwic           key word in context concordance
lemma          user-controlled lemmatization module *
lno            line number (also double lines, hemistichs, strophes)
maxwd          locate, measure and print longest word (or line)
pair           set two files side by side (or merge lines)
pause          stop terminal output to change type ball
revconc        reverse concordance module
sfind          find sentence (or record) matching a pattern
skel           prompt user with database skeleton *
togrk          convert Greek transcription for typesetter *
tolpr          filter output for lineprinter
tosel          convert English for Selectric terminal *
tprep          prepare text for concordance (trim, pad, or unpad)
troffmt        format concordance for typesetter *
umlaut         convert plus mark to umlaut
wdlen          tabulate word lengths and print histogram
wheel          roll through text a word cluster at a time
xref           cross reference words and linenumbers

               * not widely distributed, but available on request

To get the manual pages for any of these programs, type:

               % human  programname

@@@ Fin de man/INDEX.man
echo man/Makefile
cat >man/Makefile <<'@@@ Fin de man/Makefile'
MAN = ../man/

all: $(MAN)index $(MAN)accent $(MAN)cfreq $(MAN)dict $(MAN)exclude $(MAN)format $(MAN)freq $(MAN)kwal $(MAN)kwic $(MAN)lno $(MAN)maxwd $(MAN)pair $(MAN)pause $(MAN)revconc $(MAN)skel $(MAN)sfind $(MAN)tolpr $(MAN)tosel $(MAN)tprep $(MAN)wdlen $(MAN)wheel $(MAN)xref

$(MAN)index: INDEX.man
	nroff -man INDEX.man > $(MAN)index
$(MAN)accent: accent.man
	nroff -man accent.man > $(MAN)accent
	ln $(MAN)accent $(MAN)cedilla
	ln $(MAN)accent $(MAN)umlaut
$(MAN)cfreq: cfreq.man
	nroff -man cfreq.man > $(MAN)cfreq
$(MAN)dict: dict.man
	nroff -man dict.man > $(MAN)dict
$(MAN)exclude: exclude.man
	nroff -man exclude.man > $(MAN)exclude
$(MAN)format: format.man
	nroff -man format.man > $(MAN)format
$(MAN)freq: freq.man
	nroff -man freq.man > $(MAN)freq
$(MAN)kwal: kwal.man
	nroff -man kwal.man > $(MAN)kwal
$(MAN)kwic: kwic.man
	nroff -man kwic.man > $(MAN)kwic
$(MAN)lno: lno.man
	nroff -man lno.man > $(MAN)lno
$(MAN)maxwd: maxwd.man
	nroff -man maxwd.man > $(MAN)maxwd
$(MAN)pair: pair.man
	nroff -man pair.man > $(MAN)pair
$(MAN)pause: pause.man
	nroff -man pause.man > $(MAN)pause
$(MAN)revconc: revconc.man
	nroff -man revconc.man > $(MAN)revconc
$(MAN)sfind: sfind.man
	nroff -man sfind.man > $(MAN)sfind
$(MAN)skel: skel.man
	nroff -man skel.man > $(MAN)skel
$(MAN)tolpr: tolpr.man
	nroff -man tolpr.man > $(MAN)tolpr
$(MAN)tosel: tosel.man
	nroff -man tosel.man > $(MAN)tosel
$(MAN)tprep: tprep.man
	nroff -man tprep.man > $(MAN)tprep
$(MAN)wdlen: wdlen.man
	nroff -man wdlen.man > $(MAN)wdlen
$(MAN)wheel: wheel.man
	nroff -man wheel.man > $(MAN)wheel
$(MAN)xref: xref.man
	nroff -man xref.man > $(MAN)xref
@@@ Fin de man/Makefile
echo man/README
cat >man/README <<'@@@ Fin de man/README'
This directory contains nroff/troff text for use with the -man macro package (version 7 Unix only). To get printable manual pages in the directory "../man", where users can read them with the "human" command, just run "make" in this directory. This is very similar to the "make" for the C source code, except that this "Makefile" calls "nroff -man".
@@@ Fin de man/README
echo man/accent.man
cat >man/accent.man <<'@@@ Fin de man/accent.man'

NAME

accent - user-controlled accent module
cedilla - convert plus mark to cedilla
umlaut - convert plus mark to umlaut

SYNOPSIS

accent  [ -a  accfile ]  [ filename ... ]
cedilla  [ filename ... ]
umlaut  [ filename ... ]

DESCRIPTION

Accent reads accent mark definitions from ``accfile'', or from some other file specified after the -a option. Each line of this file should contain a character, some white space, and another character. Accent converts every occurrence of a left-hand character into a backspace followed by the corresponding right-hand character. When using accent, create a punctuation file (to be specified after the -d option of kwic) containing all your left-hand accent marks on its second line. Kwic will consider these as zero-width characters.

For convenience, two links to accent are provided: cedilla and umlaut. Cedilla converts the plus mark to a backspace and comma, which looks like a cedilla on the lineprinter; this convention is used even on the phototypesetter. Umlaut converts the plus mark to a backspace and double quote; this passes for an umlaut on unsophisticated output devices. The plus mark should follow the character that is to be accented; for example, type "Provenc+al" or "Mu+llerin" in your text. The use of a plus character to represent both accent marks implies that you cannot have cedillas and umlauts in the same text, unless you use the accent program.

Be sure to use the + option of kwic or kwal, or have a second line of zero-width characters in the punctuation file, in order to create the proper character alignment. Accent, cedilla, or umlaut should be called after sort, but before format. If you are using these programs, invoke the -d flag of sort to prevent accent marks from influencing dictionary order.
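
The conversion rule described above can be sketched in Python. This is only an illustration of the mapping, not the distributed program: the function names parse_accent_table and apply_accents are hypothetical, and the sketch assumes that the accent file holds exactly two single characters per line.

```python
BACKSPACE = "\b"


def parse_accent_table(lines):
    """Build a mapping from lines of 'character, white space, character'."""
    table = {}
    for line in lines:
        parts = line.split()
        # Assumption: lines that do not hold two single characters are skipped.
        if len(parts) == 2 and len(parts[0]) == 1 and len(parts[1]) == 1:
            table[parts[0]] = parts[1]
    return table


def apply_accents(text, table):
    """Replace each left-hand character by a backspace plus its right-hand character."""
    return "".join(BACKSPACE + table[ch] if ch in table else ch for ch in text)


# The two convenience links correspond to one-line tables.
cedilla_table = parse_accent_table(["+ ,"])
umlaut_table = parse_accent_table(['+ "'])

print(repr(apply_accents("Provenc+al", cedilla_table)))
print(repr(apply_accents("Mu+llerin", umlaut_table)))
```

Because the plus mark is replaced by a backspace and an overstrike character, the printed width of the word does not change, which is why kwic must treat the plus mark as a zero-width character.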

AUTHOR

Bill Tuthill

BUGS

Accent should probably be able to convert a character into an arbitrary-length string.
@@@ Fin de man/accent.man
echo man/cfreq.man
cat >man/cfreq.man <<'@@@ Fin de man/cfreq.man'

NAME

cfreq - character (or digraph) frequency count

SYNOPSIS

cfreq  [ -p -a -d -m - ]  filename ...
-p: list all printable characters (blank - '~')
-a: list all ascii characters (null - delete)
-d: count digraphs rather than single characters
-m: disable mapping of digraphs to lower case
- : read standard input instead of files

DESCRIPTION

Cfreq reads through a list of files, counting the number of occurrences of each ascii character. The counts are kept in an internal table. When reading is finished, the frequencies are listed, with the character (in ascii order) on the left, and its frequency on the right. A count of the total number of characters (including newlines) appears at the bottom of the listing. The output can be formatted into multiple columns with the Unix utility pr, if desired.

Ordinarily, only alphabetic characters are listed. The -p option, however, gives all printable characters, including letters, digits, and punctuation marks. The -a option gives all 128 ascii characters, including control characters, letters, digits, and punctuation marks.

When given the -d flag, cfreq counts digraph frequencies. Spaces, tabs and newlines are considered valid characters, as are punctuation marks, digits, and control characters. When reading is finished, the digraphs are listed on the left, with the frequency counts on the right. If the -m flag is also invoked, cfreq will not map alphabetic characters to lower case, so you will end up with capitals among the digraphs.
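
The two counting modes can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that digraphs are counted as overlapping pairs of adjacent characters, which the text above does not state explicitly.

```python
from collections import Counter


def char_counts(text, alphabetic_only=True):
    """Count single characters; return the table and the total character count."""
    counts = Counter(text)
    if alphabetic_only:
        # Default listing: only ascii letters are kept in the table.
        counts = Counter({c: n for c, n in counts.items()
                          if c.isascii() and c.isalpha()})
    # The total includes every character read, newlines too.
    return counts, len(text)


def digraph_counts(text, map_lower=True):
    """Count pairs of adjacent characters (assumed to overlap)."""
    if map_lower:
        text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


counts, total = char_counts("abba\n")
for ch in sorted(counts):
    print(ch, counts[ch])
print("total", total)
```

Passing map_lower=False plays the role of the -m flag, so capitals remain among the digraphs.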

AUTHOR

Bill Tuthill

BUGS

@@@ Fin de man/cfreq.man
echo man/dict.man
cat >man/dict.man <<'@@@ Fin de man/dict.man'

NAME

dict - split file into dictionary sections

SYNOPSIS

dict  [ - ]  filename  [ outfileroot ]
-: read standard input rather than file

DESCRIPTION

Dict will divide a text into multiple files, according to the first letter on each line. It can be used to split up a large concordance into smaller, more manageable dictionary sections. This program is akin to the Unix utility split, which divides files into 1000 line portions. Dict reads from the file given in the first argument (or standard input if the first argument is `-'), and writes onto a set of output files, all beginning with the root given in the second argument. If no second argument is given, the root defaults to "X". For every file that is created, a character is added to the root, to indicate what letter that file contains.

Theoretically, it is possible to write 128 different files, one for each ascii character. This means that each digit goes into its own file, and that an upper case "A" and a lower case "a" will end up in different files. In the case of the kwic program, all keywords are already mapped to lower case, so there should be 26 or fewer files. Here is an example of a concordance command sequence using kwic:

 % kwic text* | sort | dict - /tmp/OUT
 % edit /tmp/OUT*
 % format /tmp/OUT* | lpr
 % rm /tmp/OUT*

In the above example, dict makes small files out of one large file, so that you can edit the concordance until you are happy with it. The best and most useful concordances are always hand-edited.
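
The splitting rule can be sketched in Python. To keep the example self-contained it returns the sections in memory instead of writing files; the function name is hypothetical, and skipping empty lines is an assumption.

```python
from collections import defaultdict


def split_by_first_letter(lines, root="X"):
    """Group lines under the name root + first character of the line."""
    sections = defaultdict(list)
    for line in lines:
        if not line:
            continue  # assumption: empty lines have no first letter and are dropped
        sections[root + line[0]].append(line)
    return dict(sections)


sections = split_by_first_letter(["apple 3", "art 7", "bee 1"], root="/tmp/OUT")
for name in sorted(sections):
    print(name, sections[name])
```

Since the name depends on the exact first character, "A" and "a" produce different sections, as noted above.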

AUTHOR

Bill Tuthill

BUGS

@@@ Fin de man/dict.man
echo man/exclude.man
cat >man/exclude.man <<'@@@ Fin de man/exclude.man'

NAME

exclude - exclusion module for concordance

SYNOPSIS

exclude [ -i ignorefile ] [ -o onlyfile ] [ filename ... ]

-i: ignorefile contains words to be ignored, one per line
-o: onlyfile has only words to be printed, one per line

DESCRIPTION

Exclude is a filter that functions as an exclusion routine for deleting unnecessary words from a concordance. When invoked without any arguments, it reads standard input, and writes to standard output, filtering out all lines beginning with words listed in ``exclfile''. If any filenames of text files are given, exclude will read from them rather than from standard input.

Ordinarily, words to be ignored are read from ``exclfile'', but another ignore file can be specified after the -i option. (There is a list of common English words in /usr/lib/eign.) If you wish to preserve only a small set of words, and want all other words ignored, you can list these important words in the only file, and use the -o option; only words listed in that file will be sent through the filter. Words listed in the exclude file must be on a line of their own, with no blanks anywhere on the line.

Exclude should be used after kwic or kwal, but before sort, because eliminating unnecessary words before sorting will save large amounts of otherwise redundant machine time. Here is a sample command line using the exclusion routine:

 % kwic  textfile | exclude | sort | format

Of course, it is necessary to have words to be excluded in a file called ``exclfile'', residing in the same directory as ``textfile''. Eliminating prepositions and articles from a concordance can often shorten it by as much as one-third to one-half.
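
The filtering rule can be sketched in Python. The function name is hypothetical, and the sketch assumes that the keyword is the first white-space-delimited token on each line.

```python
def exclude(lines, ignore=None, only=None):
    """Keep lines whose first word passes the ignore list or the only list."""
    kept = []
    for line in lines:
        words = line.split()
        first = words[0] if words else ""
        if only is not None:
            # -o behaviour: pass only the listed words.
            if first in only:
                kept.append(line)
        elif ignore is None or first not in ignore:
            # -i behaviour: drop the listed words.
            kept.append(line)
    return kept


concordance = ["the 1 the cat sat", "cat 1 the cat sat", "sat 1 the cat sat"]
print(exclude(concordance, ignore={"the", "a", "of"}))
print(exclude(concordance, only={"cat"}))
```

Running the filter before the sort means the sort never sees the excluded lines, which is where the saving in machine time comes from.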

AUTHOR

Bill Tuthill

BUGS

There cannot be more than 500 lines in the exclude file.
@@@ Fin de man/exclude.man
echo man/format.man
cat >man/format.man <<'@@@ Fin de man/format.man'

NAME

format - format and count keywords in concordance

SYNOPSIS

format  [ -mck ]  [ filename ... ]  [ - ]
-m: keywords not mapped from lower to upper case
-c: suppress counting of keywords (will speed it up)
-k: suppress printing of separate keyword
- : read standard input instead of files

DESCRIPTION

Format is generally the last program used in making a concordance. Once the concordance has been compiled and sorted, using kwic or kwal and sort, the keywords can be formatted into capitalized headings followed by a frequency count. Format depends on sorted input to make its frequency counts.

If for some reason you do not want an upper case keyword heading, you can preserve the lower case keywords by using the -m option. Keyword counting can also be suppressed by using the -c option; this will speed up the format program somewhat. To completely suppress printing of a separate keyword, use the -k option; this will produce only the identification field and the context.

Here is a typical program sequence for a concordance, suitable for sending to the lineprinter:

 % kwic -c100 filename(s) | sort | format | lpr

The -c100 argument to kwic creates a long context suitable for the lineprinter.
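
The dependence on sorted input can be illustrated with a short Python sketch. The heading layout "KEYWORD (n)" is invented for the example and is not the actual output of format; the sketch also assumes that the keyword is the text before the first vertical bar.

```python
from itertools import groupby


def keyword_of(line):
    """Assumption: the keyword is everything before the first vertical bar."""
    return line.split("|", 1)[0].strip()


def format_concordance(sorted_lines, upper=True, count=True):
    """Emit a heading for each run of lines that share a keyword."""
    out = []
    # groupby only merges adjacent lines, so unsorted input gives split counts.
    for keyword, group in groupby(sorted_lines, key=keyword_of):
        entries = list(group)
        heading = keyword.upper() if upper else keyword
        if count:
            heading = f"{heading} ({len(entries)})"
        out.append(heading)
        out.extend(entries)
    return out


lines = ["cat  |1 the cat sat", "cat  |2 a cat ran", "dog  |3 the dog"]
for row in format_concordance(lines):
    print(row)
```

Here upper=False corresponds to the -m option and count=False to the -c option.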

FILES

Format creates a temporary file, /tmp/Fmt?????, where it stores all the contexts of a single keyword, while counting the frequency of that keyword. This tempfile is removed in case of interrupt.

AUTHOR

Bill Tuthill

BUGS

The -k option silently overrides the -m and -c options.
@@@ Fin de man/format.man
echo man/freq.man
cat >man/freq.man <<'@@@ Fin de man/freq.man'

NAME

freq - word frequency count

SYNOPSIS

freq  [ -n  -m  -dpfile  - ]  filename ...
-n: list words in numerical order of frequency
-m: disable mapping of letters to lower case
-d: define punctuation set according to pfile
- : read standard input instead of files

DESCRIPTION

Freq reads through a list of files, counting the number of occurrences of each word. The frequencies and words are kept in a binary tree structure in core memory, so that the program will be as efficient as possible. When reading is finished, the frequencies are listed on the left, with the words, in alphabetical order, on the right. The total number of words, and the number of different words, is tabulated and given at the end of the wordlist. The output can be formatted into columns with pr, if desired.

The -n option will cause the words to be listed by numerical order of frequency, with the most common words first. The -m flag will leave capital letters as they are. The -d option allows the user to define his own punctuation set. If this option is called, freq will replace the default punctuation set ,.;:-?!"()[]{} with the last line of the specified file.
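
The counting and the two listing orders can be sketched in Python. A hash-based Counter stands in for the binary tree, the function names are hypothetical, and the column layout of the listing is an assumption.

```python
from collections import Counter

DEFAULT_PUNCT = ',.;:-?!"()[]{}'


def word_freq(text, punct=DEFAULT_PUNCT, map_lower=True):
    """Count words after turning every punctuation mark into a blank."""
    if map_lower:
        text = text.lower()
    table = str.maketrans({c: " " for c in punct})
    return Counter(text.translate(table).split())


def listing(counts, numeric=False):
    """Frequency on the left, word on the right."""
    if numeric:
        # -n behaviour: most common words first, ties in alphabetical order.
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        items = sorted(counts.items())
    return [f"{n:6d} {w}" for w, n in items]


counts = word_freq("The cat; the dog.")
print("\n".join(listing(counts, numeric=True)))
print("total words:", sum(counts.values()), "different words:", len(counts))
```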

AUTHOR

Bill Tuthill

BUGS

Freq will run out of core memory at about 64K bytes of storage. In that case it is necessary to use prep, sort and uniq, which is a much slower process, but which can handle large amounts of data.
@@@ Fin de man/freq.man
echo man/kwal.man
cat >man/kwal.man <<'@@@ Fin de man/kwal.man'

NAME

kwal - key word and line concordance

SYNOPSIS

kwal  [ -kn -m -wS -fn -sn -r -ln -x -dF + - ]  filename ...
-kn: keyword is n characters long (defaults to 15)
-m : keywords not mapped from upper to lower case
-wS: write string S onto id field (use quotes around blanks)
-fn: filename (up to n characters) written onto id field
-sn: skip n characters of lefthand id field in text and write as id
-r : reset linenumber to 1 at beginning of every file
-ln: line numbering begins with line n (instead of 1)
-x : line numbering is suppressed entirely
-dF: define punctuation set according to file F
+ : the + character indicates cedilla or umlaut
- : read text from standard input (terminal or pipe)

DESCRIPTION

Kwal is a text concordance program, generally for use with poetry. Normally, it prints a left-hand keyword (adjusted for backspaces), a 6 digit linenumber, and the line of context. The following characters are considered to be punctuation marks: ,.;:-"?!()[]{} but all other non-alphabetic characters can be part of a word. These punctuation characters can be changed.

By default, only the first 15 characters of the keyword are printed, followed by a vertical bar; longer keywords are truncated. If you want more or less than 15 characters in the keyword, use the -k option to lengthen or shorten it. To find the longest word in your text, try the maxwd program, and set -k accordingly. You can also use maxwd -l to determine the length of your longest context line. Keywords are mapped to lower case to ease the logistics of sorting, unless the -m option is specified.

The -w argument allows you to write an id field (such as the name of an author or work) after the keyword. If you want to include any blanks, enclose the entire string in quotes: -w"Poetic Edda". The -f argument allows you to write the current filename, up to a number of characters you specify. If the filename is shorter, it will be blank-padded, and if it is longer, it will be truncated.

If you are concording a series of short poems, each starting with line 1, type them into separate files, and use the -r option to reset the linenumber to 1 at the beginning of each new file. If you resume concording in the middle of your text, you can set the line number with the -l option. If your text is already numbered or identified, with a system that is not entirely arithmetic, such as by hemistich or by double lines, you can print your custom id field by using the -s option. This will skip over n characters of your lefthand id field embedded in the text, and print it as an id field, after the (-f) filename, but before the (-l) linenumber. When you also want to suppress linenumbering, use the -x option.

If you are working with a foreign language, and need to use normal punctuation marks as diacritical marks, you can change the default punctuation set with the -d option. Just type the punctuation marks you want into a file, on a single line with no embedded spaces, and specify the filename after the -d in your command line. If you have cedillas or umlauts, you can represent them as a `+' character after the accented letter. Use the `+' option of kwal, and filter your output through either the cedilla or umlaut program.

Once the concordance has been generated, it should be alphabetized with the Unix sort program. Keywords should then be grouped and counted with the format program, and the final results can be sent to the lineprinter. Here is a typical program sequence for generating a concordance:

 % kwal poem* | sort | format | lpr

Usually, it is better to send the results of format to a file, where they can be examined and edited, before sending the file to the lineprinter.
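
The basic output record of kwal can be sketched in Python. The exact column layout (a blank between the linenumber and the line) is an assumption, the function name is hypothetical, and none of the id field options are modelled.

```python
PUNCT = ',.;:-"?!()[]{}'


def kwal(lines, keylen=15, start=1, map_lower=True):
    """Emit one record per word: keyword, vertical bar, linenumber, full line."""
    table = str.maketrans({c: " " for c in PUNCT})
    out = []
    for lineno, line in enumerate(lines, start):
        for word in line.translate(table).split():
            key = word.lower() if map_lower else word
            # Long keywords are truncated, short ones are blank-padded.
            key = key[:keylen].ljust(keylen)
            out.append(f"{key}|{lineno:6d} {line}")
    return out


for record in sorted(kwal(["Sing, goddess, the wrath", "of Achilles"])):
    print(record)
```

The keylen, start and map_lower parameters stand in for the -k, -l and -m options. Sorting the records, as in the pipeline above, brings every line that contains a given keyword together.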

LIMITATIONS

Lines of text cannot be longer than 512 characters. Linenumbers cannot exceed 999999 without skewing the output format. Most lineprinters will not print entries longer than 132 characters.

AUTHOR

Bill Tuthill

BUGS

@@@ Fin de man/kwal.man
echo man/kwic.man
cat >man/kwic.man <<'@@@ Fin de man/kwic.man'

NAME

kwic - key word in context concordance

SYNOPSIS

kwic  [ -kn -m -wS -fn -r -ln -pn -ic -cn -dF + - ]  filename ...
-kn: keyword is n characters long (defaults to 15)
-m : keywords not mapped from upper to lower case
-wS: write string S onto id field (use quotes around blanks)
-fn: filename (up to n characters) written onto id field
-r : reset linenumber to 1 at beginning of every file
-ln: line numbering begins with line n (instead of 1)
-pn: page numbering begins with page n (instead of 1)
-ic: page incrementer is character c (defaults to =)
-cn: context is n characters long (defaults to 50)
-dF: define punctuation set according to file F
+ : the + character indicates cedilla or umlaut
- : read text from standard input (terminal or pipe)

DESCRIPTION

Kwic is a text concordance program, generally for use with prose, although it is often used for poetry. Normally, it prints a left-hand keyword, a 6 digit linenumber or 6 place pagenumber (depending on how you want to label your text), and a context of 50 characters, centered around the keyword. Words are separated at their natural boundaries, and adjustment is made for backspaces. Newline characters are printed as "/", and tabs are printed as a single blank. If you want to have a space after the newline "/", use the pad option of tprep to insert a space at the beginning of each line in your text. The following characters are considered to be punctuation marks: ,.;:-"?!()[]{} but all other non-alphabetic characters can be part of a word. These punctuation characters can be changed.

By default, only the first 15 characters of the keyword are printed, followed by a vertical bar; longer keywords are truncated. If you want more or less than 15 characters in the keyword, use the -k option to lengthen or shorten it. To find the longest word in your text, use the maxwd program, and set -k accordingly. Keywords are mapped to lower case to ease the logistics of sorting, unless the -m option is specified.

The -w argument allows you to write an id field (such as the name of an author or work) after the keyword. If you want to include any blanks, enclose the entire string in quotes: -w"Prose Edda". The -f argument allows you to write the current filename, up to a number of characters you specify. If the filename is shorter, it will be blank-padded, and if it is longer, it will be truncated.

If the program encounters the character "=", which, by default, indicates pagination, it will count pages as well as line numbers. Line numbers will print as: `` 12469'', while page numbers will print as: ``178,12''. If you are concording a series of short poems, each starting with line 1, type them into separate files, and use the -r option to reset the linenumber to 1 at the beginning of each new file. If you resume concording in the middle of your text, you can set the line number with the -l option, or the page number with the -p option. If you want to indicate pagination, make sure that you begin your text with ``=1'', on a line of its own, to indicate the first page. When a new chapter starts at the top of the page, be sure to set -p to the previous page. The page indicator can be changed with the -i option; -i% will change it to a percent sign, for instance.

If you are sending output to the lineprinter, the context width can be increased with the -c argument; -c110, for instance, will give you about 55 characters on either side of the keyword in context. Note that the lineprinter can print only 132 characters per line, so add up your field widths carefully.

 % kwic -c110 chapter* | sort | format | lpr

Usually, it is better to send the results of format to a file, where they can be examined and edited, before sending the file to the lineprinter.
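
The centering of the context around the keyword can be sketched in Python. This is a simplified illustration: it cuts the context at a fixed character count rather than at natural word boundaries, it ignores backspaces, page numbers and id fields, and the column layout and function name are assumptions.

```python
import re

PUNCT = ',.;:-"?!()[]{}'
# A word is a run of characters that are neither white space nor punctuation.
WORD = re.compile(r"[^\s" + re.escape(PUNCT) + r"]+")


def kwic(text, keylen=15, context=50):
    """Emit keyword, vertical bar, linenumber, and context centered on the keyword."""
    # Newlines print as "/" and tabs as a single blank; the length is unchanged.
    flat = text.replace("\n", "/").replace("\t", " ")
    half = context // 2
    out = []
    for match in WORD.finditer(text):
        start = match.start()
        lineno = text.count("\n", 0, start) + 1
        left = flat[max(0, start - half):start].rjust(half)
        right = flat[start:start + half].ljust(half)
        key = match.group().lower()[:keylen].ljust(keylen)
        out.append(f"{key}|{lineno:6d} {left}{right}")
    return out


for record in sorted(kwic("Sing, goddess, the wrath\nof Achilles", context=30)):
    print(record)
```

Every record has the same width, so the keyword always begins in the same column, which is what makes the sorted output easy to scan.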

FILES

A temporary file, /tmp/KwicXXXXX, is created if kwic has to work with standard input, because seeking can only be done with files.

LIMITATIONS

Words cannot be longer than 512 characters, nor can the first half of the context. Linenumbers cannot exceed 999999 and pagenumbers cannot exceed 999,99 without skewing the output format. Most lineprinters will not print entries longer than 132 characters, and the CAT/4 typesetter cannot handle lines longer than 7.54 inches.

AUTHOR

Bill Tuthill

BUGS

If there are lots of backspaces in the text, the context width is somewhat shortened. Using a wheel-like data structure might be more efficient than using disk seeks and reads to output the contexts.
@@@ Fin de man/kwic.man
echo man/lno.man
cat >man/lno.man <<'@@@ Fin de man/lno.man'

NAME

lno - line number, double line number, hemistich number

SYNOPSIS

lno  [ +n  -d  -h  -sn  - ]  filename ...
+n : the beginning line number is n, not 1
-d : double line number text with long lines
-h : hemistich number text with split lines
-sn: number and letter strophes of n lines
-  : read standard input instead of files

DESCRIPTION

Lno line numbers a text, starting at 1 (one) and going up. If you are beginning in the middle of a text, the initial line number can be specified after the + option.

With the -d option, lno numbers a text with long (Germanic) lines, which are generally labelled in editions as double lines. It starts at 1 (one), or at the specified line number, and goes up in increments of two at the end of each line.

With the -h option, lno numbers a text with hemistichs, or half lines. It starts at 1 (one), unless another beginning number is specified. The first line is labelled 1a, the second 1b, the third 2a, the fourth 2b, and so forth.

The -s option can be used to specify the number of lines in a strophe. For example, -s4 will produce 1a, 1b, 1c, 1d, 2a, and so on. The -h option is identical to the -s2 argument.

With the -d option, if you specify an even beginning number, all the following double line numbers will be even. With the -h option, all line pairs have a number postfixed with "a" and then "b", so if you want to begin with a "b", put an empty line in your text, to be labelled "a".
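
The numbering schemes above can be sketched in Python. The function name is hypothetical; the sketch produces only the labels, not the labelled text, and it assumes strophes of at most 26 lines.

```python
import string


def line_labels(count, start=1, double=False, strophe=0):
    """Return the labels lno would attach to `count` consecutive lines."""
    labels = []
    for i in range(count):
        if strophe:
            # -sn: number each strophe and letter the lines inside it.
            number = start + i // strophe
            letter = string.ascii_lowercase[i % strophe]
            labels.append(f"{number}{letter}")
        elif double:
            # -d: double line numbers go up in increments of two.
            labels.append(str(start + 2 * i))
        else:
            labels.append(str(start + i))
    return labels


print(line_labels(5, strophe=4))
print(line_labels(4, strophe=2))
print(line_labels(3, start=2, double=True))
```

Calling the function with strophe=2 gives the hemistich numbering of the -h option.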

AUTHOR

Bill Tuthill

BUGS

@@@ Fin de man/lno.man
echo man/maxwd.man
cat >man/maxwd.man <<'@@@ Fin de man/maxwd.man'

NAME

maxwd - locate, measure and print longest word (or line)

SYNOPSIS

maxwd  [ -l  -dF  - ]  filename ...
-l: look for longest line instead of longest word
-d: define punctuation set according to file F
- : read standard input instead of files

DESCRIPTION

Maxwd reads through a set of files, or standard input if specified, looking for the longest word. After reading is finished, the filename and linenumber of the longest word are printed, with the length of that word. On the next line, the longest word is printed verbatim.

With the -l option, maxwd will look for the longest line, and print its filename, linenumber, and length. Similarly, the next line will contain the longest line, verbatim.

If several files are concatenated and sent through a pipe to maxwd, the filename will appear as "Stdin" and line numbering will continue to increment across file boundaries.

Maxwd should be used before concording a text with kwic or kwal, in order to determine what keyword length you should specify. If you are working with foreign languages, the -d option can be used to split words at the proper place; the punctuation file is compatible with many other related programs.
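
The search for the longest word can be sketched in Python. The function name is hypothetical, and the sketch assumes the same default punctuation set as kwic and that the first of several equally long words wins.

```python
PUNCT = ',.;:-"?!()[]{}'


def longest_word(named_texts, punct=PUNCT):
    """Return (filename, linenumber, word) for the longest word found."""
    table = str.maketrans({c: " " for c in punct})
    best = ("", 0, "")
    for name, text in named_texts:
        for lineno, line in enumerate(text.splitlines(), 1):
            for word in line.translate(table).split():
                if len(word) > len(best[2]):
                    best = (name, lineno, word)
    return best


name, lineno, word = longest_word([
    ("a.txt", "one two\nthree"),
    ("b.txt", "concordance, yes"),
])
print(name, lineno, len(word))
print(word)
```

The length found this way is the value to pass to the -k option of kwic or kwal.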

AUTHOR

Bill Tuthill

BUGS

Maxwd will truncate words longer than 512 characters, and Maxwd -l will truncate lines longer than 1024 characters.
@@@ Fin de man/maxwd.man
echo man/pair.man
cat >man/pair.man <<'@@@ Fin de man/pair.man'

NAME

pair - set two files side by side (or merge lines)

SYNOPSIS

pair  [ -m ]  file1  [ - ]  file2  [ +len1  [ +len2 ] ]
-m: merge (intercalate) files line by line
- : read standard input instead of files
len1 and len2 denote screen width of file1 and file2

DESCRIPTION

Pair is a program for looking at two parallel texts, in order to compare and contrast them. By default, pair sets them side by side, but with the -m option, it shuffles them together. This utility is useful for examining manuscript variations. It will accept standard input rather than a file, if a dash is used in place of the filename. Output can be redirected if desired.

By default, pair prints two 40-character wide columns of text, which gives equal space to each text, and fills up the screen. The third and fourth arguments can be used to change the column width for the first and second files, respectively. For example, if your first file is composed of numbers but your second file contains text with occasional long lines, specify something like:

 %  pair  file1  file2  +10  +70

If you have long lines and would rather have lines from each text on separate lines, use the -m option.

Pair can be used for comparing textual variants. It is especially useful for making two texts parallel before analyzing the variants with diff or diff3. Diff compares two files, while diff3 compares three at a time. The results from these programs will be more usable if the texts are parallel before they are analyzed.
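
The two layouts can be sketched in Python. The function names are hypothetical, and truncating lines that are wider than their column is an assumption; the text above does not say how pair treats them.

```python
from itertools import zip_longest


def side_by_side(left, right, len1=40, len2=40):
    """Set two lists of lines in columns of width len1 and len2."""
    rows = []
    for a, b in zip_longest(left, right, fillvalue=""):
        # Assumption: overlong lines are cut to the column width.
        rows.append(a[:len1].ljust(len1) + b[:len2])
    return rows


def merge(left, right):
    """Intercalate two lists of lines, as with the -m option."""
    out = []
    for a, b in zip_longest(left, right):
        if a is not None:
            out.append(a)
        if b is not None:
            out.append(b)
    return out


print("\n".join(side_by_side(["1", "2"], ["first line", "second line"], 10, 70)))
print("\n".join(merge(["a1", "a2"], ["b1"])))
```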

AUTHOR

Bill Tuthill

BUGS

When output is redirected and input is being taken from the terminal, it is impossible to tell what is coming from the input file.
@@@ Fin de man/pair.man
echo man/pause.man
cat >man/pause.man <<'@@@ Fin de man/pause.man'

NAME

pause - stop terminal output to change type ball

SYNOPSIS

pause [ filename ... ]

DESCRIPTION

Pause will stop terminal output when it encounters a control-p embedded in the text it is reading, and resume output when a control-d is typed on the terminal keyboard. Other than that, pause acts much like the Unix utility cat. It is intended for use on a Selectric terminal with an IBM ball, or on a DTC or IPSI terminal with a Diablo printwheel.

If your text proceeds in one language, and then changes to another for a quote, just put a ctrl-p in your text between sections. The terminal will pause until you change the printing device, and when you are ready to continue, you can type ctrl-d on the terminal.

AUTHOR

Bill Tuthill

BUGS

Control characters embedded in the text can affect the lineprinter, nroff and troff, and many other programs. So do not be indiscriminate with your use of control-p.
@@@ Fin de man/pause.man
echo man/revconc.man
cat >man/revconc.man <<'@@@ Fin de man/revconc.man'

NAME

revconc - reverse concordance module

SYNOPSIS

revconc [ filename ... ]

DESCRIPTION

Revconc reverses the first word on each line, which in a concordance is, conveniently, the keyword. This program is intended to be a module to create a reverse concordance. Words will be alphabetized from the end to the beginning, rather than from the beginning to the end, as is usual. The results can be used to examine word endings and inflections.

It should be used between a series of pipes including kwic or kwal, sort, and format. Here is a suggested command sequence:

 % kwic filename(s) | revconc | sort | revconc | format

It must be used twice, or else the word will appear backwards in the final version. The first invocation of revconc reverses the keyword, so that sort operates from the back to the front, while the second invocation restores normal order to the word.

Many published concordances contain a Reverse List of Graphic Forms; revconc can be used for this purpose, but the Unix utility rev would probably be faster. Here is a suggested command sequence for making a Reverse List of Graphic Forms:

 % prep filename(s) | rev | sort -u | rev

The results can be put into columns with the Unix utility pr.
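
The reverse, sort, reverse sequence can be sketched in Python. The function names are hypothetical, and the sketch assumes that the keyword is the first run of non-blank characters on the line.

```python
import re

FIRST_WORD = re.compile(r"\S+")


def revconc(lines):
    """Reverse the first word on each line and leave the rest untouched."""
    return [FIRST_WORD.sub(lambda m: m.group()[::-1], line, count=1)
            for line in lines]


def reverse_sorted(lines):
    """Reverse the keywords, sort, then reverse again to restore them."""
    return revconc(sorted(revconc(lines)))


for line in reverse_sorted(["walking 1", "walked 3", "talking 2"]):
    print(line)
```

Words that share an ending, such as the two "-ing" forms here, come out next to each other.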

AUTHOR

Bill Tuthill

BUGS

It is not possible to make a reverse concordance using context, rather than line number, as the secondary sort field.
@@@ Fin de man/revconc.man
echo man/sfind.man
cat >man/sfind.man <<'@@@ Fin de man/sfind.man'

NAME

sfind - find sentence matching a pattern

SYNOPSIS

sfind  [ -sC -ln -pn -ic -r ]  'pattern'  [ - ]  filename ...
-sC: record separator set to C (or empty line with no C)
-ln: line number is set to n (instead of 1)
-pn: page number is set to n (default off)
-ic: page incrementing character is c (not =)
-r : reset linenumber to 1 with each new file
-  : read standard input instead of files

DESCRIPTION

Sfind is a rewrite of the Unix utility grep, oriented towards sentences rather than towards lines. It is useful for finding words and syntactic patterns in their full linguistic context. If the pattern is longer than one word, or if it contains magic shell characters, it must be enclosed in quotes. You can specify multiple filenames, and sfind will search through them in order. If there is a match, it will print the current filename, the line number where the sentence begins, the page number if relevant, and the pattern, all on a single line. This information will be followed by the sentence exactly as it appears in the text.

The pattern wildcard character `_' (underscore) matches any single character; it is similar to the `.' (period) in grep, or the `?' (question mark) in the shell. The wildcard character `*' (asterisk) matches any number of characters in your text until the pattern continues; it is exactly like the `*' wildcard in the shell. It is also similar, but not identical, to the `*' in grep, which matches zero or more repetitions of the previous character. To find an actual underscore or asterisk, precede these metacharacters with a backslash.
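
The wildcard rules above can be illustrated by translating an sfind pattern into a Python regular expression. This is a sketch of the matching rules only, not of how sfind is implemented; the function name is hypothetical.

```python
import re


def sfind_to_regex(pattern):
    """Translate `_`, `*` and backslash escapes into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            out.append(".")      # any single character
        elif ch == "*":
            out.append(".*")     # any number of characters
        else:
            out.append(re.escape(ch))
        i += 1
    # DOTALL lets a wildcard cross a newline inside a sentence.
    return re.compile("".join(out), re.DOTALL)


sentence = "They were walking\nhome in the rain."
print(bool(sfind_to_regex("w_lk*rain").search(sentence)))
print(bool(sfind_to_regex("a\\*b").search("axxb")))
```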

If you begin searching in the middle of a text, you can set the beginning line number (or page number) with the -l (or -p) option. For compatibility with the page incrementing feature of kwic, sfind will count pages if it encounters `=' (equals) in the text. The incrementing character can be changed with the -i option. If you want to reset the linenumber to 1 at the beginning of each new file, use the -r option.

The -s option is for use with databases where records are separated by a record separator. This character can be specified after the -s, and the program will operate a record at a time, rather than a sentence at a time. If the record separator is a magic shell character, it will have to be quoted or escaped with a backslash. A -s alone indicates that records are separated by a blank line, as are records in refer bibliographies. It is similar to the -F option of awk.
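As a rough illustration of record-at-a-time operation, the following Python sketch splits a text either on a separator character or on blank lines. The function name split_records is hypothetical, and treating a line of only spaces or tabs as blank is an assumption rather than documented sfind behavior.

```python
import re

def split_records(text, separator=None):
    """Split text into records.

    With a separator character, records end at each occurrence of it.
    With no separator, records are divided by blank lines, as in refer
    bibliographies. Empty records are dropped (an assumption).
    """
    if separator is None:
        records = re.split(r"\n[ \t]*\n", text)
    else:
        records = text.split(separator)
    return [r.strip("\n") for r in records if r.strip()]
```

A pattern would then be tested against each record in turn, instead of against each sentence.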

AUTHOR

Bill Tuthill

BUGS

There is no equivalent in sfind to the [...] and [^...] metacharacters of grep. These would be extremely helpful.

NAME

skel - prompt user for database skeleton

SYNOPSIS

skel outfile

DESCRIPTION

Skel reads from a ``promptfile'' containing a skeleton outline of subjects in a database, prompts the user for data, and writes the outline and the data to the ``outfile''. The promptfile must have exactly that name, and reside in the working directory. Outfiles cannot be overwritten, to protect vital information.

It is possible to escape to a system editor, from which you can easily correct mistakes, by giving a ``tilde escape'' on the data line. The tilde must be the first character on the input line. In the distributed program, ~v will escape to vi, and ~e will escape to ex; both editors are part of 2bsd and 4bsd (Berkeley Software Distribution). If you don't have these editors, simply change the code and recompile the program so it will work with ed, or your own favorite editor.

FILES

promptfile - file containing database skeleton

AUTHOR

Bill Tuthill

BUGS


NAME

tolpr - shift output for the lineprinter

SYNOPSIS

tolpr [ -2 ] [ -h "Header" ] [ -s ] [ filename ... ]

DESCRIPTION

Tolpr adds a tab at the beginning of every line, which moves your text away from the holes and used-up ribbon. If the first line in a file is non-blank, tolpr also prints page numbers, inserts three-line header and footer margins, and saves widows for the top of the next page. If the first line in a file is blank, tolpr only shifts output to the right, on the assumption that the file is already paginated. Consequently, it can be used equally well with nroff, with pr, and with concordances.

The -2 option will cause output to be double spaced; -3 will cause triple spacing, and so forth. This is a substitute for the .ls 2 of nroff/troff, or the .nr VS 24 of the -ms macros. The -h option is used to print a header at the top of each page; it only works if pagination is in effect. The -s flag suppresses the shifting to the right.
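The shifting and spacing behavior can be sketched in a few lines of Python. This is a simplified illustration that ignores pagination, headers, and widows; the function name shift_lines and its parameters are hypothetical, with spacing standing in for the -2 and -3 options and shift=False for the -s flag.

```python
def shift_lines(lines, spacing=1, shift=True):
    """Prefix each line with a tab and add blank lines for extra spacing.

    spacing=2 gives double spacing, spacing=3 triple spacing, and so on.
    shift=False suppresses the tab. Pagination is not modeled here.
    """
    out = []
    for line in lines:
        out.append(("\t" if shift else "") + line)
        # Follow each text line with spacing-1 empty lines.
        out.extend([""] * (spacing - 1))
    return out
```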

AUTHOR

Bill Tuthill

BUGS

Sometimes the first line in a file is blank, but the file is not pre-paginated; if this occurs, delete the blank line.

NAME

tosel - convert English for Selectric terminal

SYNOPSIS

tosel [ filename ... ]

DESCRIPTION

Tosel is intended for use with the Anderson-Jacobson terminal in the Humanities Computing Service. It will convert Unix files to character strings that print out properly when a regular typewriter ball is used in the AJ-841, instead of the ebcdic ball normally used in the machine.

The tosel program works much like the Unix utility cat. That is to say, it can be used to print out one or more files, or as a filter in a series of programs communicating by pipes. It can be used before or after pause, since it does nothing to Control-P.

AUTHOR

Bill Tuthill

BUGS

The following characters, since they do not exist on a standard typewriter ball, produce garbage output:

 <  >  |    ^

The circumflex character produces a blank space. These five characters are not rendered accurately:

 [  {  ]  }  `

They produce, in order, these five similar characters:

 (  (  )  )  '
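The five substitutions listed above amount to a simple character translation table. The Python sketch below models only those substitutions, not the rest of the conversion tosel performs for the typewriter ball; the names SELECTRIC_SUBSTITUTES and approximate_for_selectric are hypothetical.

```python
# Each character on the left prints as the similar character on the right:
#   [ -> (    { -> (    ] -> )    } -> )    ` -> '
SELECTRIC_SUBSTITUTES = str.maketrans("[{]}`", "(())'")

def approximate_for_selectric(text):
    """Replace the five unavailable characters with their look-alikes."""
    return text.translate(SELECTRIC_SUBSTITUTES)
```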

Of course, various IBM balls will differ, and will cause further program bugs. The tosel program was written for the IBM "Pica 72" 10-pitch ball, but will probably work perfectly for any ball that has the characters `!' and `1', and 1/2 and 1/4.

NAME

tprep - prepare text for concordance (trim, pad, or unpad)

SYNOPSIS

tprep  [ -y  -tpu ]  filename ...
-y: say yes and suppress interactive prompting
-t: trim lines, removing trailing blanks and tabs
-p: pad, inserting blank at beginning of each line
-u: unpad, deleting blank at beginning of each line

DESCRIPTION

Tprep is a semi-interactive text editor with specific application to preparing text for concordances. It is much faster than sed, and will work on far larger files than ex or ed. It provides limited facilities: trimming of trailing blanks or tabs, and padding and unpadding.

When typing in a text, it is practically impossible to avoid accidental spaces at the end of lines. These spurious blanks throw off the results of character counting, and are unsightly in a kwic-style concordance. Also, before compiling a kwic concordance, you may want to pad each line with a blank, so that the slash indicating newline is not followed too closely by the next word. After finishing the concordance, the padding can be removed, using the unpad option.
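The three operations are simple enough to sketch per line in Python. The function names are hypothetical, and it is an assumption that unpad leaves a line alone when it does not begin with a blank.

```python
def trim(line):
    """Remove trailing blanks and tabs from a line."""
    return line.rstrip(" \t")

def pad(line):
    """Insert one blank at the beginning of a line."""
    return " " + line

def unpad(line):
    """Delete one leading blank, if present (assumed behavior otherwise)."""
    return line[1:] if line.startswith(" ") else line
```

Padding followed by unpadding returns the original line, which is why the padding can safely be removed once the concordance is finished.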

If you do not specify any options in the command line, you are prompted to make sure you want to rewrite your files. Then you are asked whether you want to use trim, pad or unpad. You can answer either with the full word, or with the first letter of these three words. Tprep also tells what files it is rewriting, and reports on the scope of the changes involved for each file.

FILES

Tprep makes changes to a file, and writes the results to /tmp/Prep?????; this file is then copied back on top of the original file.

AUTHOR

Bill Tuthill

BUGS

Once tprep has begun rewriting files it cannot be stopped: interrupts are disabled, since vital information could otherwise be lost forever. Interrupts should probably halt the process after the next overwrite.

NAME

troffmt - format concordance for typesetter

SYNOPSIS

troffmt  [ -ckm ]  [ filename ... ]  [ - ]
-c: suppress counting of keyword frequency
-k: entirely suppress printing of keyword
-m: do not supply concordance macros automatically
- : read standard input instead of files

DESCRIPTION

Troffmt is a preprocessor for troff that replaces format when using the phototypesetter instead of the lineprinter. It builds its own macros, so it does not require the -ms package.

Keyword counting can be suppressed by using the -c option; this will speed up the program somewhat. To completely suppress printing of a separate keyword, use the -k option.

Here is a typical program sequence for a concordance, suitable for sending to the typesetter:

 % kwic -f5 -c80 filename(s) | sort | troffmt | troff -Q

The -c80 argument to kwic creates a context suitable for the typesetter. Anything larger may result in lines too long for the typesetter. If there is no -f or -w option, -c85 would be safe; with long -f or -w options, adjust -c accordingly.

FILES

Troffmt depends on /usr/lib/me/chars.me, or /usr/lib/mx/tmac.xacc, for accent mark definitions.

AUTHOR

Bill Tuthill

BUGS

On systems without either -me or -mx, accent marks are undefined. The -k option silently overrides the -c option. The -m flag does not have the same meaning as the -m flag in format.

NAME

wdlen - tabulate word lengths and print histogram

SYNOPSIS

wdlen  [ -l  -dPfile  - ]  filename ...
-l: print long histogram suitable for lineprinter
-d: define punctuation set according to Pfile
- : read standard input instead of files

DESCRIPTION

Wdlen reads through a text, tabulating the frequencies of various word lengths. Then it prints out these frequencies, along with a horizontal bar graph of word length. Word length distribution is one of many stylistic traits that can be analyzed in a linguistic corpus.

If there are a great number of words in your text, the dashes in the bar graph do not have a one to one correspondence with the frequency count, but are calculated so that the longest bar fills up the screen. The length of the bar can be extended with the -l option.
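The scaling of the bar graph can be illustrated with a short Python sketch. It is not the program's actual arithmetic: the function name word_length_histogram, the default width of 60, and the rounding rule are assumptions. Only the idea that the longest bar is scaled to fill the available width, and the limit of 20 characters per word, come from this page.

```python
from collections import Counter

def word_length_histogram(words, width=60, max_len=20):
    """Return (length, count, bar) rows for a list of words.

    Bars use one dash per word while the longest bar fits in `width`;
    otherwise every bar is scaled so the longest one is exactly `width`.
    Words longer than max_len are not considered.
    """
    counts = Counter(len(w) for w in words if 0 < len(w) <= max_len)
    if not counts:
        return []
    longest = max(counts.values())
    scale = 1.0 if longest <= width else width / longest
    rows = []
    for length in sorted(counts):
        # Every non-empty length gets at least one dash (an assumption).
        bar = "-" * max(1, round(counts[length] * scale))
        rows.append((length, counts[length], bar))
    return rows
```

A larger width plays the role of the -l option, giving longer bars for lineprinter paper.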

If you are working with foreign languages, the -d option can be used to split words at the proper place; the ``Pfile'' is compatible with many other related programs.

AUTHOR

Bill Tuthill

BUGS

Words longer than 20 characters are not considered.

NAME

wheel - roll through text a word cluster at a time

SYNOPSIS

wheel  [ +n  -m  -dF  - ]  filename ...
+n: print clusters of n words (default 2)
-m: do not map upper case to lower case
-d: define punctuation set according to file F
- : read standard input instead of files

DESCRIPTION

To analyze syntactic clusters, you can roll wheel through your text, several words at a time. The second word of the initial cluster will become the first word of the following cluster, and so forth. By default, each output line contains a two-word cluster, but with the + option, you can specify any cluster size up to 20. The -m option prevents mapping of words to lower case, and the -d option can be used to specify non-standard punctuation.

After extracting all the word clusters in your text, they can be sorted and counted to find repeated patterns. Here is an example of a command line to accomplish this:

 % wheel +3 text | sort | uniq -c

Of course, sort can be applied to any field desired; ``sort +2'' refers to the third word on each line. It would be good to analyze syntactic clusters of two, three, four, and possibly more words apiece. British scholars use the cumbersome term ``collocation'' to mean word cluster.
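The rolling window and the subsequent counting can be sketched in Python as follows. The function name clusters is hypothetical, and two points are assumptions: that a word is a run of letters and apostrophes, and that clusters run across sentence boundaries.

```python
import re
from collections import Counter

def clusters(text, n=2, map_lower=True):
    """Return every overlapping n-word cluster in the text, in order.

    Each cluster starts one word after the previous one, so the second
    word of one cluster is the first word of the next.
    """
    if map_lower:
        text = text.lower()
    # Assumed punctuation set: anything but letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Counting repeated clusters, in the spirit of `sort | uniq -c`.
counts = Counter(clusters("The cat sat. The cat ran.", n=2))
```

Here counts records ``the cat'' twice and every other two-word cluster once.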

AUTHOR

Bill Tuthill

BUGS


NAME

xref - cross reference generator

SYNOPSIS

xref  [ -r -ln -pn -ic -wn -dF - ]  filename ...
-r : reset linenumber to 1 at beginning of every file
-ln: line numbering begins with line n (instead of 1)
-pn: page numbering begins with page n (instead of 1)
-ic: page incrementer is character c (defaults to =)
-wn: width of output page is n (defaults to 80)
-d : define punctuation set according to file F
-  : read text from standard input (terminal or pipe)

DESCRIPTION

Xref is a cross reference generator that lists all distinct words in a text, with the line number (or page number) where they appear. Its output constitutes a simple word index, without the labels or context quoting provided by kwic or kwal. If you want your concordance to give merely the location of certain common words, without any context, you may want to use selected output of xref.

If you are cross referencing a number of short texts, you can reset the linenumber to 1 with the -r option. Line number and page number can be set with the -l and -p options. The default pagination character is the equals sign; if you have another page indicator, it can be set with the -i option. In case your text has equals signs that do not indicate a new page, you could use the -i option without a character afterwards, and page labelling will not occur.

Xref will also read a user-definable punctuation set from the file specified after the -d option. It can also read from standard input. Most importantly, the output width can be set with the -w option. For example, to send a cross reference index to the lineprinter, a -w130 is recommended. The default page width is 80, which is appropriate for a CRT terminal or for regular paper.
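The core of a cross reference index can be sketched in Python. This illustrates the idea only, not xref's tempfile-and-sort method; the function name cross_reference is hypothetical, and the lower-casing of words, the definition of a word, and the listing of each line number only once per word are all assumptions. Page numbers and output width are not modeled.

```python
import re
from collections import defaultdict

def cross_reference(lines, start=1):
    """Map each distinct word to the line numbers where it appears.

    `start` plays the role of the -l option. Words are returned in
    alphabetical order.
    """
    index = defaultdict(list)
    for lineno, line in enumerate(lines, start):
        for word in re.findall(r"[A-Za-z']+", line.lower()):
            # Record a line number once, even if the word repeats on it.
            if not index[word] or index[word][-1] != lineno:
                index[word].append(lineno)
    return dict(sorted(index.items()))
```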

FILES

A text is broken into words labelled by line number or page number, and then sent to a tempfile, /tmp/RefXXXXX, where the results are sorted before final formatting. This file is removed in case of interrupt.

AUTHOR

Bill Tuthill

BUGS

In the tempfile, words are separated from line numbers (or page numbers) by a control-b, so if you have this character anywhere in your text, you will get strange results.


This document was created by man2html, using the manual pages.
Time: 03:53:50 GMT, July 02, 2025
